Development and Optimization of a Selective Whole-Genome Amplification To Study Plasmodium ovale Spp.

ABSTRACT Since 2010, the human-infecting malaria parasite Plasmodium ovale spp. has been divided into two genetically distinct species, P. ovale wallikeri and P. ovale curtisi. In recent years, the application of whole-genome sequencing (WGS) to P. ovale spp. has provided a better understanding of its evolutionary history and revealed some specific genetic patterns. Nevertheless, WGS data from P. ovale spp. are still scarce due to several drawbacks, including a high level of human DNA contamination in blood samples, commonly low parasite densities, and the lack of a robust in vitro culture system. Here, we developed two selective whole-genome amplification (sWGA) protocols that were tested on six P. ovale wallikeri and five P. ovale curtisi mono-infection clinical samples. Blood leukodepletion by cellulose-based filtration was used as the gold standard for intraspecies comparative genomics with sWGA. We also demonstrated the importance of preincubating genomic DNA with the endonuclease McrBC to optimize P. ovale spp. sWGA. We obtained high-quality WGS data, with more than 80% of the genome covered by ≥5 reads for each sample, and identified more than 5,000 unique single-nucleotide polymorphisms (SNPs) per species. We also identified some amino acid changes in pocdhfr and powdhfr for which similar mutations in P. falciparum and P. vivax are associated with pyrimethamine or cycloguanil resistance. In conclusion, we developed two sWGA protocols for P. ovale spp. WGS that will help to design much-needed large-scale P. ovale spp. population studies.
IMPORTANCE Plasmodium ovale spp. has the ability to cause relapse, defined as recurring asexual parasitemia originating from liver-dormant forms. Whole-genome sequencing (WGS) data are important for identifying putative molecular markers associated with relapse or other virulence mechanisms. Due to the low parasitemia encountered in P. ovale spp. infections and the absence of in vitro culture, WGS of P. ovale spp. is challenging. Blood leukodepletion by filtration has been used, but no technique yet exists to increase the quantity of parasite DNA relative to human DNA when starting from genomic DNA extracted from whole blood. Here, we demonstrated that selective whole-genome amplification (sWGA) is an easy-to-use protocol for obtaining high-quality WGS data for both P. ovale spp. species from unprocessed blood samples. The new method will facilitate P. ovale spp. population genomic studies.

The premise of this manuscript is great for malaria biology, where there is limited knowledge on the evolution of other species of human malaria parasites, especially P. malariae and P. ovale. In this manuscript, the authors present a method to specifically amplify P. ovale species towards future population genomics analyses that may be informative of its evolution and guide interventions for the complete elimination of all malaria parasites. The method they have presented, though not novel, is applied here to P. ovale species for the first time. There are a number of issues that need to be resolved:
1. The authors should elaborate more on how the background from the human genome and contamination with P. falciparum was assessed in the design of the sWGA primers and their use. The only indication is that the sWGA primers for each species were designed, eliminating the other as background. So, were the primer sets species-specific? Can a gel showing no amplification for P. falciparum, humans, and the non-target species be shown? In natural infections, P. ovale is mostly seen as a co-infection with P. falciparum and sometimes with P. malariae too. So eliminating these would help to ensure that these primers can be used in the field.
2. The authors only used supposed monoinfections. Following from the above, these are rare, and larger genomic studies would need to deal with co-infecting Plasmodium species. There is no evidence on whether the short reads from these monoinfections can map to P. falciparum or P. malariae. This control against these other species will be clear evidence of specificity.
3. Controls were leukocyte depleted. It is not clear how these were chosen to be controls. Was sWGA also applied to these controls? To determine the effectiveness of sWGA, sWGA and non-sWGA sequences from the same sample should be compared.
4. For others to use this protocol, it will be helpful to know how much DNA from controls and sWGA was used for library prep, in case these were not amplified. Was the sequencing library prep PCR-free or PCR-based?
5. It is not clear if the genome coverage reported in the main text is for all samples combined and, if so, whether 10x is the mean or median coverage. Did this include coverage for the controls as well?
6. From the scatter plot of parasitemia vs difference between sWGA and McrBC-sWGA, sample IDs could help with clarity.
7. Considering that it is not clear if the short reads generated were mapped against P. falciparum orthologues of drug resistance genes, it is possible that any co-sequenced P. falciparum or P. malariae drug resistance targets would result in variants. As real-time PCR seems to indicate that these were monoinfections, the authors may have to discuss how this will be applied to wild isolates with contaminating coinfections.
8. For the number of samples sequenced, true allele frequencies cannot be determined. If the frequencies reported were from vcftools, then the authors need to indicate that these were determined from read counts and not from consensus data.
9. For the total number of variants detected, the numbers are not clear. For example, 9,782 (3,326 per sample). This seems to be for 3 samples rather than the 5 samples sequenced.
10. Considering there are non-chromosomal contigs for P. ovale species, why were these not also used to map reads? Alternatively, the authors could enrich the manuscript by attempting de novo assembly.
11. For the interest of the Plasmodium genomics community, genome-wide plots of heterozygosity would be informative, though this will be limited given the sample size.
Overall, the work presented has merits and can be improved. The discussion winds through a repeat of the results, rather than contextualizing the outcomes of the work.

Preparing Revision Guidelines
To submit your modified manuscript, log onto the eJP submission site at https://spectrum.msubmit.net/cgi-bin/main.plex. Go to Author Tasks and click the appropriate manuscript title to begin the revision process. The information that you entered when you first submitted the paper will be displayed. Please update the information as necessary. Here are a few examples of required updates that authors must address:
• Point-by-point responses to the issues raised by the reviewers in a file named "Response to Reviewers," NOT IN YOUR COVER LETTER.
• Upload a compare copy of the manuscript (without figures) as a "Marked-Up Manuscript" file.
• Each figure must be uploaded as a separate file, and any multipanel figures must be assembled into one file.
For complete guidelines on revision requirements, please see the journal Submission and Review Process requirements at https://journals.asm.org/journal/Spectrum/submission-review-process. Submission of a paper that does not conform to Microbiology Spectrum guidelines will delay acceptance of your manuscript. Please return the manuscript within 60 days; if you cannot complete the modification within this time period, please contact me. If you do not wish to modify the manuscript and prefer to submit it to another journal, please notify me of your decision immediately so that the manuscript may be formally withdrawn from consideration by Microbiology Spectrum.
If your manuscript is accepted for publication, you will be contacted separately about payment when the proofs are issued; please follow the instructions in that e-mail. Arrangements for payment must be made before your article is published. For a complete list of Publication Fees, including supplemental material costs, please visit our website.
Dear Dr. Valentin Joste: Due to limited availability of reviewers, we were only able to secure one formal review. However, I have evaluated the manuscript and have provided some comments and would like these addressed in the resubmission.
-It would be informative to add a read coverage comparison between non-sWGA and sWGA samples. This would help readers appreciate the benefit of using the method.
Thank you for this suggestion. We did not perform non-sWGA sequencing to prove the benefits of using sWGA. Several studies have already reported the failure of next-generation sequencing without preamplification or filtration (1)(2)(3)(4). In the figure below, published by Oyola et al. (1), sequencing of Plasmodium falciparum clinical isolates without selective preamplification (corresponding to WGA in the figure) is not feasible, with fewer than 5% of reads mapping to P. falciparum. Based on those previous observations, we considered it unnecessary to test the non-sWGA condition. Instead, we compared the read coverage of the sWGA and leukodepleted samples (considered as controls) to validate the use of sWGA (see Figure S3).
-Do you know the success rate for clinical sample amplification? Did you encounter any samples that failed? This information is useful for others planning to use this method.
We did not observe any amplification failure among the clinical samples. However, we did not test samples with parasite densities below 1,790 parasites/µL for P. ovale curtisi and 198 parasites/µL for P. ovale wallikeri, so we cannot predict the success of amplification below those levels. This is now clearly stated in the Discussion (line 405).
[Excerpt reproduced from a published P. falciparum sWGA study, section "sWGA allows whole genome sequencing from clinical dried blood spots": genome coverage at ≥5× dropped sharply for samples with parasitaemia below 0.005%.]
We consider that leukodepletion remains the best overall technique for processing P. ovale spp. clinical samples because it provides more homogeneous read mapping than sWGA. Leukodepletion therefore allows analyses based on read distribution, such as the measurement of gene copy number variation (5), which is related to some forms of drug resistance in Plasmodium (6).
In our laboratory, we use sWGA for samples that cannot be filtered, in the following situations:
- low volume of blood available (leukodepletion requires a minimum red blood cell volume of 200 µL);
- retrospective studies of samples collected before leukodepletion of fresh blood samples was implemented.
We have added these recommendations to the Discussion (lines 461 to 463).
-How many SNPs are called by both methods? How many overlap between the two methods? These metrics are important because you show in Figure 3 that read count is uneven following SWGA.
For Poc1, we called 3,732 and 6,980 SNPs with the sWGA and filtered approaches, respectively. You are right: our data did not fit the assumptions of Pearson's correlation test, as assessed with the Kolmogorov-Smirnov and Levene tests. We therefore used Spearman's rank correlation test to compare the NRAF of both methods.
We reconsidered the use of R² and decided to remove it. We did not seek to establish a mathematical relationship between the NRAF values obtained with sWGA and with leukodepletion, only to determine whether the NRAF values (when NRAF > 0) of the two methods were correlated.
We have completed the statistical analysis section of the Methods (lines 252 to 263).
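The rank-based comparison described above can be illustrated with a minimal, self-contained sketch of Spearman's rank correlation: tied values receive average ranks, and Pearson's r is then computed on the ranks. This is not the authors' script; the function names and the filtering rule (keeping only sites where NRAF > 0 in both methods) are assumptions for illustration. In practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_nraf(nraf_swga, nraf_leuko):
    """Spearman's rho between paired NRAF values from two methods.

    Assumption: only sites with NRAF > 0 in both methods are kept.
    """
    pairs = [(a, b) for a, b in zip(nraf_swga, nraf_leuko) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two sites with NRAF > 0 in both methods")
    x, y = zip(*pairs)
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(list(x)), average_ranks(list(y)))


# Hypothetical NRAF values at five sites for one isolate.
swga = [0.0, 0.12, 0.45, 0.98, 1.0]
leuko = [0.05, 0.10, 0.50, 0.95, 1.0]
print(round(spearman_nraf(swga, leuko), 6))  # 1.0: same ordering after filtering
```

Because Spearman's rho only uses ranks, it asks whether the two methods order the sites the same way, which matches the stated goal of testing correlation rather than fitting a mathematical relationship.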
Minor points:
1. Legend of Supp fig 3: "red points a difference between 50 to 75%" should be "blue points a difference between 50 to 75%".
Thank you for this remark; the modification has been made in Figure S5.

Title: optimisation -> optimization
Thank you for this remark; the modification has been made.
Reviewer comments:
Reviewer #1 (Comments for the Author):
The premise of this manuscript is great for malaria biology, where there is limited knowledge on the evolution of other species of human malaria parasites, especially P. malariae and P. ovale. In this manuscript, the authors present a method to specifically amplify P. ovale species towards future population genomics analyses that may be informative of its evolution and guide interventions for the complete elimination of all malaria parasites. The method they have presented, though not novel, is applied here to P. ovale species for the first time. There are a number of issues that need to be resolved:
1. The authors should elaborate more on how the background from the human genome and contamination with P. falciparum was assessed in the design of the sWGA primers and their use. The only indication is that the sWGA primers for each species were designed, eliminating the other as background. So, were the primer sets species-specific? Can a gel showing no amplification for P. falciparum, humans, and the non-target species be shown? In natural infections, P. ovale is mostly seen as a co-infection with P. falciparum and sometimes with P. malariae too. So eliminating these would help to ensure that these primers can be used in the field.
When designing primers with the sWGA software, the algorithm asks for a background genome. For Plasmodium blood infections, we chose the human genome as the background. We did not provide another Plasmodium genome (such as P. falciparum) as background because only one background genome could be used. The primer sets were therefore not designed to be specific to P. ovale spp. over other Plasmodium species, but to be specific over the human genome. We considered that the main issue was human DNA contamination and not possible co-infecting Plasmodium species (although this could indeed be another issue). The number of primer binding sites is twice as large for P. ovale wallikeri or P. ovale curtisi as for P. falciparum. In addition, the s/n ratio is twice as high for P. falciparum as for the P. ovale spp. genomes. The primers will preferentially bind P. ovale spp. DNA over P. falciparum DNA, but we are not sure this will be sufficient when P. falciparum parasitaemia is much higher than that of P. ovale spp. In our experience in imported malaria with qPCR data and as previously published (7)
2. The authors only used supposed monoinfections. Following from the above, these are rare, and larger genomic studies would need to deal with co-infecting Plasmodium species. There is no evidence on whether the short reads from these monoinfections can map to P. falciparum or P. malariae. This control against these other species will be clear evidence of specificity.
You are right. To address this hypothesis, we concatenated the P. ovale curtisi or P. ovale wallikeri genome with the P. falciparum (Pf3D7, PlasmoDB release 57) and P. malariae (PmUG01, PlasmoDB release 57) genomes and aligned the P. ovale spp. sequencing reads against this combined reference. As shown in the example plots, the large majority of P. ovale curtisi (Figure 2A) and P. ovale wallikeri (Figure 2C) reads mapped to the P. ovale spp. genome and not to the P. falciparum or P. malariae genomes. Moreover, reads that mapped to the P. falciparum or P. malariae genomes were of lower quality (Figures 2B and 2D) and had shorter insert sizes (Figures 3A and 3B). Figures 2A to 2D have been added to the revised submission (Figure S2). The insert sizes displayed in Figure 3 represent the part of the paired-end reads that mapped to the reference. The shortest insert sizes mapping to the P. falciparum or P. malariae genomes (∼20 bp, see Figure 4) probably represent short sequences conserved between Plasmodium species.
3. Controls were leukocyte depleted. It is not clear how these were chosen to be controls. Was sWGA also applied to these controls? To determine the effectiveness of sWGA, sWGA and non-sWGA sequences from the same sample should be compared.
We prospectively selected one P. ovale curtisi sample and one P. ovale wallikeri sample received at the French National Malaria Reference Center for filtration (Poc1 and Pow1). We applied sWGA to these leukodepleted controls (see Table 1 and Figure 1B) to evaluate its effectiveness. We compared the SNPs obtained with the two methods and found very similar NRAF values (see Figure S5).
We did not perform non-sWGA (without leukodepletion) sequencing to prove the benefits of using sWGA. Several publications have already reported the failure of Plasmodium sequencing without preamplification or filtration (1)(2)(3)(4), and based on those previous observations we considered it unnecessary to test the non-sWGA condition.
4. For others to use this protocol, it will be helpful to know how much DNA from controls and sWGA was used for library prep, in case these were not amplified. Was the sequencing library prep PCR-free or PCR-based?
For library preparation, 250 ng of DNA was used when possible. For the leukodepleted controls, DNA concentrations were very low (0.452 ng/µL for Pow1 and 0.114 ng/µL for Poc1), so 50 µL was used (22.6 ng for Pow1 and 5.7 ng for Poc1). We have added these details to the Methods section (line 187). The sequencing library protocol was PCR-based.
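The input amounts quoted above follow from mass = concentration × volume, capped by the volume that can be loaded. A minimal sketch, assuming a 250 ng target and a 50 µL maximum volume (both taken from the response); the function name is hypothetical and this is not part of any library-preparation kit's software.

```python
def library_input(conc_ng_per_ul, target_ng=250.0, max_volume_ul=50.0):
    """Return (volume_ul, mass_ng) of DNA to load into library preparation.

    Uses just enough volume to reach target_ng; if the extract is too dilute,
    the full max_volume_ul is used and the loaded mass falls below the target.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    volume_ul = min(target_ng / conc_ng_per_ul, max_volume_ul)
    return volume_ul, conc_ng_per_ul * volume_ul


# Leukodepleted controls from the response above (concentrations in ng/µL).
for sample, conc in [("Pow1", 0.452), ("Poc1", 0.114)]:
    volume, mass = library_input(conc)
    print(sample, volume, round(mass, 1))
# Pow1 50.0 22.6
# Poc1 50.0 5.7
```

Both controls hit the volume cap, which is why their inputs (22.6 ng and 5.7 ng) fall well below the 250 ng used for sWGA products.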
[Figure 4, IGV view: insert size (∼20 bp); soft-clipped bases (bases at the 5' and 3' ends of the read that are not part of the alignment); short alignment to P. falciparum; long alignment to P. ovale curtisi.]
5. It is not clear if the genome coverage reported in the main text is for all samples combined and, if so, whether 10x is the mean or median coverage. Did this include coverage for the controls as well?
The mean coverage reported in the main text (32x for P. ovale curtisi and 24x for P. ovale wallikeri) is for the sWGA method. For the sWGA + McrBC approach, the mean coverage is 62x for P. ovale curtisi and 83x for P. ovale wallikeri. These results do not include the leukodepleted controls (93x for Poc1 and 99x for Pow1). We also compared the percentage of the genome covered by at least 10 reads across the different methods (see Figure S3b).
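The distinction raised in point 5 (mean versus median depth, and the share of the genome above a depth threshold) can be made concrete with a short sketch. It assumes per-position depths in a `chrom<TAB>pos<TAB>depth` layout that includes zero-depth positions, as produced by `samtools depth -a`; the function name and thresholds are illustrative, not the authors' pipeline.

```python
from statistics import mean, median


def coverage_summary(depth_lines, thresholds=(5, 10)):
    """Summarize per-position depth given 'chrom<TAB>pos<TAB>depth' lines.

    Zero-depth positions must be present (e.g. output of `samtools depth -a`),
    otherwise mean, median and percentages are computed over covered sites only.
    """
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        raise ValueError("no depth records")
    summary = {"mean": mean(depths), "median": median(depths)}
    for t in thresholds:
        summary[f"pct_ge_{t}x"] = 100.0 * sum(d >= t for d in depths) / len(depths)
    return summary


# Toy, uneven depth profile (hypothetical values, six positions).
lines = [f"chr1\t{i + 1}\t{d}" for i, d in enumerate([0, 3, 6, 12, 12, 87])]
print(coverage_summary(lines))
```

With uneven coverage, as can occur after sWGA, a few very deep positions pull the mean (20x here) well above the median (9x), so reporting the percentage of positions at ≥5x or ≥10x is more informative than the mean alone.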
6. From the scatter plot of parasitemia vs difference between sWGA and McrBC-sWGA, sample IDs could help with clarity.
We have modified the figure to include sample IDs.

7. Considering that it is not clear if the short reads generated were mapped against P. falciparum orthologues of drug resistance genes, it is possible that any co-sequenced P. falciparum or P. malariae drug resistance targets would result in variants. As real-time PCR seems to indicate that these were monoinfections, the authors may have to discuss how this will be applied to wild isolates with contaminating coinfections.
Concatenating the resistance-associated gene sequences of P. ovale curtisi or P. ovale wallikeri with those of P. falciparum and P. malariae clearly helps to rule out this hypothesis. We took FASTQ files from previously published P. malariae (ERR4019168 (12)) and P. falciparum (ERR636035 (1)) data and aligned them against the concatenated resistance-associated gene sequences (PF3D7_0417200, PF3D7_0810800, PF3D7_0709000, PF3D7_0523000, PF3D7_1343700, PmUG01_05034700, PmUG01_14045500, PmUG01_01020700, PocGH01_05028400, PocGH01_14036800, PmUG01_10021600, PmUG01_12021200, PocGH01_01016900, PocGH01_10018700 and PocGH01_12019400).
[Figure caption: Coverage across reference and insert size across reference for ERR636035 on five known P. falciparum resistance genes and their P. ovale curtisi orthologues. Plots were generated using Qualimap (11).]
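Both the genome-wide check in the response to point 2 and the resistance-gene check above rely on aligning reads against a concatenated multi-species reference and then tallying where the reads land. The authors used Qualimap and IGV for this; the sketch below is only an illustration of the tallying step on plain SAM text. The contig-name prefixes in `SPECIES_PREFIXES` are hypothetical and must be adapted to the FASTA headers actually concatenated (a P. ovale wallikeri prefix would also need adding).

```python
from collections import defaultdict

# Hypothetical contig-name prefixes for the concatenated reference.
SPECIES_PREFIXES = {
    "Pf3D7": "P. falciparum",
    "PmUG01": "P. malariae",
    "PocGH01": "P. ovale curtisi",
}

# SAM FLAG bits (SAM specification): unmapped, secondary, supplementary.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def tally_by_species(sam_lines, prefixes=SPECIES_PREFIXES):
    """Return {species: (primary_read_count, mean_MAPQ)} from SAM text lines."""
    counts = defaultdict(int)
    mapq_sum = defaultdict(int)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, at its primary alignment
        species = next(
            (name for p, name in prefixes.items() if rname.startswith(p)), "other"
        )
        counts[species] += 1
        mapq_sum[species] += mapq  # note: MAPQ 255 means "unavailable" in SAM
    return {s: (counts[s], mapq_sum[s] / counts[s]) for s in counts}


sam = [
    "@SQ\tSN:Pf3D7_04_v3\tLN:1200490",
    "r1\t99\tPocGH01_05_v1\t100\t60\t100M\t=\t300\t300\tA\tI",
    "r2\t0\tPocGH01_14_v1\t500\t42\t100M\t*\t0\t0\tA\tI",
    "r3\t0\tPf3D7_04_v3\t700\t3\t20M80S\t*\t0\t0\tA\tI",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tA\tI",
    "r2\t256\tPf3D7_04_v3\t10\t0\t100M\t*\t0\t0\tA\tI",
]
print(tally_by_species(sam))
```

Counting only primary alignments keeps each read in a single species bin, and the mean MAPQ per bin reflects the observation that reads landing on the non-target genomes tend to be short, low-quality alignments.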
As shown in Figure 5 (for P. malariae) and Figure 6 (for P. falciparum), no P. ovale curtisi genes were covered by P. malariae or P. falciparum reads.
8. For the number of samples sequenced, true allele frequencies cannot be determined. If the frequencies reported were from vcftools, then the authors need to indicate that these were determined from read counts and not from consensus data.
We did not calculate allele frequencies in this study.
9. For the total number of variants detected, the numbers are not clear. For example, 9,782 (3,326 per sample). This seems to be for 3 samples rather than the 5 samples sequenced.
The total number of variants detected is the number of unique SNPs detected, not the sum of the SNPs detected in each sample: if two samples display the same SNP, it counts as one SNP, not two. We have rephrased the sentences for clarification (line 362).
10. Considering there are non-chromosomal contigs for P. ovale species, why were these not also used to map reads? Alternatively, the authors could enrich the manuscript by attempting de novo assembly.
We did map the reads against both the chromosomes and the contigs of the reference genome, and the mapping data we presented are against the whole genome. However, we used only the reconstructed chromosomes for SNP calling, because the contigs are mainly composed of P. ovale spp. pir genes. The pir genes are highly variable (like var genes in P. falciparum) and form the largest Plasmodium multigene family (13). Because of their high variability among clinical Plasmodium isolates, alignment alone is not sufficient to obtain high-quality data; local reconstructions, such as those previously published for var genes (14), are necessary. As no script is currently available to easily reconstruct P. ovale spp. pir genes, we decided not to analyze the SNPs obtained on those contigs.
We agree that de novo assembly would be of great interest. Unfortunately, we do not have the capacity to perform such an analysis in our laboratory.
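The two counting rules described in the responses to points 9 and 10 (SNPs are taken only from the reconstructed chromosomes, and a SNP shared by several samples counts once) can be sketched as follows. This is an illustration, not the authors' pipeline: the chromosome names are passed in explicitly because the real reference contig names are not given here, the contig names in the example are hypothetical, and treating each ALT allele of a multi-allelic site as a separate SNP is an assumed convention.

```python
BASES = {"A", "C", "G", "T"}


def snps_on_chromosomes(vcf_lines, chromosomes):
    """Return the set of (chrom, pos, ref, alt) SNPs lying on the given chromosomes.

    Records on other contigs (e.g. unplaced pir-rich contigs) and non-SNP
    alleles (indels, '*', '.') are skipped; each ALT allele of a multi-allelic
    site is kept as its own SNP (an assumed convention).
    """
    snps = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if chrom not in chromosomes or ref not in BASES:
            continue
        for allele in alt.split(","):
            if allele in BASES:
                snps.add((chrom, int(pos), ref, allele))
    return snps


def unique_snp_count(per_sample_snps):
    """Count SNPs across samples so that a SNP shared by several samples counts once."""
    return len(set().union(*per_sample_snps))


s1 = {("chr1", 10, "A", "G"), ("chr1", 20, "C", "T")}
s2 = {("chr1", 10, "A", "G"), ("chr2", 5, "G", "A")}
print(sum(len(s) for s in (s1, s2)), unique_snp_count([s1, s2]))  # 4 3
```

The example shows why a total of unique SNPs can be smaller than the per-sample average multiplied by the number of samples: the shared SNP at chr1:10 enters the union only once.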
11. For the interest of the Plasmodium genomics community, genome-wide plots of heterozygosity would be informative, though this will be limited given the sample size.
We have added the NRAF profiles and the genome-wide heterozygosity plots of the P. ovale spp. isolates in Figures S4a and S4b, as previously done by Pearson et al. (15). The percentage of heterozygous calls was low for both species (<0.1%). We observed less heterozygosity for P. ovale curtisi, possibly linked to its lower depth of coverage compared with P. ovale wallikeri (Figure S3 in the Supplemental Material). The NRAF plots were quite difficult to interpret in the absence of any reference dataset for P. ovale spp.
The heterozygous SNPs we observed may reflect background noise due to an imperfectly reconstructed reference genome or to misalignment of reads in paralogous regions (15). More genomic data are needed to:
- improve the reference genome;
- compute a hyperheterozygosity score and use it in the variant filtering approach, as described for P. vivax (15).
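The percentage of heterozygous calls and a genome-wide heterozygosity profile, as discussed in the response to point 11, can be derived from per-site genotype calls. A minimal sketch, assuming genotypes in VCF `GT` notation (`0/1`, `1|1`, `./.`, haploid `1`) and a hypothetical 10 kb window for plotting; it does not reproduce the NRAF-based approach of Pearson et al. or any hyperheterozygosity filter.

```python
from collections import Counter


def _alleles(gt):
    """Split a VCF GT string such as '0/1', '1|1' or '1' into allele codes."""
    return gt.replace("|", "/").split("/")


def is_called(gt):
    return all(a != "." for a in _alleles(gt))


def is_heterozygous(gt):
    return is_called(gt) and len(set(_alleles(gt))) > 1


def heterozygosity_summary(calls, window=10_000):
    """calls: iterable of (chrom, pos, genotype) tuples for one isolate.

    Returns the percentage of called genotypes that are heterozygous and a
    Counter of heterozygous calls per (chrom, 1-based window start) bin.
    """
    called = het = 0
    per_window = Counter()
    for chrom, pos, gt in calls:
        if not is_called(gt):
            continue  # missing genotypes do not enter the denominator
        called += 1
        if is_heterozygous(gt):
            het += 1
            per_window[(chrom, (pos - 1) // window * window + 1)] += 1
    pct = 100.0 * het / called if called else 0.0
    return pct, per_window


calls = [
    ("c1", 5, "0/0"), ("c1", 15, "0/1"), ("c1", 10005, "1/1"),
    ("c1", 12000, "0|1"), ("c2", 3, "./."), ("c2", 7, "1"),
]
pct, bins = heterozygosity_summary(calls)
print(pct, dict(bins))
```

Plotting the per-window counts along each chromosome would show whether heterozygous calls cluster in particular regions, such as paralogous or poorly reconstructed segments, rather than being spread evenly.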
12. Overall, the work presented has merits and can be improved. The discussion winds through a repeat of the results, rather than contextualizing the outcomes of the work.
We have modified the Discussion.
We appreciate that the authors addressed all of the reviewer comments in the revised manuscript. Overall, all concerns have been addressed. I am recommending acceptance of the manuscript; however, the data on the cross-mapping of reads at resistance loci (between Pf-Po and Pm-Po) that were included in the response to the reviewers need to be included and referenced in the final version of the manuscript. These data affect the use of the method with mixed-infection samples, an important application of the method.
As you will see your paper is very close to acceptance. Please modify the manuscript along the lines I have recommended. As these revisions are quite minor, I expect that you should be able to turn in the revised paper in less than 2 weeks, if not sooner.
When submitting the revised version of your paper, please provide (1) point-by-point responses to the issues I raised in your cover letter, and (2) a PDF file that indicates the changes from the original submission (by highlighting or underlining the changes) as file type "Marked Up Manuscript -For Review Only". Please use this link to submit your revised manuscript. Detailed instructions on submitting your revised paper are below.

Thank you for the privilege of reviewing your work. Below you will find instructions from the Microbiology Spectrum editorial office and comments generated during the review.
The ASM Journals program strives for constant improvement in our submission and publication process. Please tell us how we can improve your experience by taking this quick Author Survey. Sincerely,

Jennifer Guler
Editor, Microbiology Spectrum
